Heterogeneous Subset Sampling
Authors
Abstract
In this paper, we consider the problem of heterogeneous subset sampling. Each element of a domain set has its own probability of being included in a sample, which is a subset of the domain set. Drawing a sample from a domain set of size $n$ takes $O(n)$ time if a naive algorithm is employed. We propose a Hybrid algorithm, which requires $O(n)$ preprocessing time and $O(n)$ extra space. It draws a sample in $O(n\sqrt{p^*})$ time on average, where $p^* = \min(p_\mu, 1 - p_\mu)$ and $p_\mu$ denotes the mean of the inclusion probabilities. In addition to the theoretical analysis, we evaluate the performance of the Hybrid algorithm experimentally and present an application to particle-based simulations of the spread of a disease.
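To make the baseline and the quantity $p^*$ concrete, here is a minimal Python sketch. `naive_sample` is the straightforward $O(n)$ approach (one independent Bernoulli trial per element), and `p_star` computes $p^* = \min(p_\mu, 1 - p_\mu)$. `thinned_skip_sample` is NOT the paper's Hybrid algorithm, whose details are not given here. It is a standard, swapped-in technique (geometric skipping followed by thinning), included only to show how a sampler can avoid touching every element when the probabilities are small. Its expected work is $O(1 + nq)$, where $q$ is the largest inclusion probability. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def naive_sample(probs: Sequence[float], rng: Optional[random.Random] = None) -> List[int]:
    """Baseline: one independent Bernoulli trial per element, so Theta(n) work per sample."""
    if rng is None:
        rng = random.Random()
    return [i for i, p in enumerate(probs) if rng.random() < p]


def p_star(probs: Sequence[float]) -> float:
    """p* = min(p_mu, 1 - p_mu), where p_mu is the mean inclusion probability."""
    p_mu = sum(probs) / len(probs)
    return min(p_mu, 1.0 - p_mu)


def thinned_skip_sample(probs: Sequence[float], rng: Optional[random.Random] = None) -> List[int]:
    """NOT the paper's Hybrid algorithm: geometric skipping plus thinning.

    Candidates are generated as if every element had probability q = max(probs),
    jumping over a Geometric(q) number of indices at a time; candidate i is then
    kept with probability probs[i] / q, so its overall inclusion probability is
    probs[i]. Expected work is O(1 + n * q).
    """
    if rng is None:
        rng = random.Random()
    n = len(probs)
    q = max(probs, default=0.0)
    if q <= 0.0:
        return []
    if q >= 1.0:
        # log(1 - q) is undefined here and skipping cannot help, so fall back.
        return naive_sample(probs, rng)
    log_miss = math.log1p(-q)  # log(1 - q), strictly negative
    sample: List[int] = []
    i = -1  # index of the previous candidate
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        skip = math.log(u) / log_miss  # >= 0; floor(skip) ~ Geometric(q) failures
        if i + 1 + skip >= n:
            return sample
        i += 1 + int(skip)
        if rng.random() * q < probs[i]:  # accept with probability probs[i] / q
            sample.append(i)
```

For example, `thinned_skip_sample([0.05, 0.2, 0.0, 0.1], random.Random(1))` returns a sorted list of indices in which index `i` appears with probability `probs[i]`. Because this sketch's bound depends on the largest probability rather than on the mean $p_\mu$, a single element with probability near 1 makes it no faster than the naive baseline; the abstract's bound instead depends on the mean through $p^*$.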
Similar resources
OPTIMIZATION OF SKELETAL STRUCTURES USING IMPROVED GENETIC ALGORITHM BASED ON PROPOSED SAMPLING SEARCH SPACE IDEA
In this article, an attempt is made to increase the speed of optimization with a genetic algorithm (GA) by partitioning the design space. To this end, the design space is searched in two steps: a global search and a local search. To achieve this, following the idea of meshing in FEM, the list of sections is first divided into specific subsets. Then, the intermediate member of each subset, as the representative of that subset, is defined...
On the Variance of Subset Sum Estimation
For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries to arbitrary subset sums. With unit weights, we can compute subset sizes which together with the previous sums provide the subset averages. The question addre...
RELIABILITY–BASED DESIGN OPTIMIZATION OF CONCRETE GRAVITY DAMS USING SUBSET SIMULATION
The paper deals with the reliability–based design optimization (RBDO) of concrete gravity dams subjected to earthquake load using subset simulation. The optimization problem is formulated such that the optimal shape of concrete gravity dam described by a number of variables is found by minimizing the total cost of concrete gravity dam for the given target reliability. In order to achieve this p...
Unbiased sampling of network ensembles
Sampling random graphs with given properties is a key step in the analysis of networks, as random ensembles represent basic null models required to identify patterns such as communities and motifs. A key requirement is that the sampling process is unbiased and efficient. The main approaches are microcanonical, i.e. they sample graphs that exactly match the enforced constraints. Unfortunately, w...
Provably Correct Algorithms for Matrix Column Subset Selection with Selectively Sampled Data
We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous real-world data applications such as population genetics summarization, electronic circuits testing and recommendation systems. In many applications...